Rabin-Karp String Matching Demo - Interactive Tutorial

📝 Problem Statement

Find all occurrences of a Pattern ($P$) in a Text ($T$). Like the Naive algorithm, we allow **overlapping matches** and advance the pattern by **only 1 index** after a full or partial comparison.

Rabin-Karp introduces **hashing** to quickly filter out non-matching windows in the text, performing character-by-character comparisons only when hash values match.

Input Section

Text (T): Pattern (P):

Initial Visualization:

💡 Rabin-Karp Algorithm Logic

Key Idea: Hashing Substrings

Instead of direct character comparisons for every position, Rabin-Karp calculates a **hash value** for the pattern and for each window of text of the same length as the pattern.

Algorithm Steps:

**Choose Parameters**: Select a base ($d$) and a large prime modulus ($q$). A common choice for $d$ is the number of characters in the alphabet (e.g., 256 for ASCII).
**Precompute Pattern Hash**: Calculate $h_P = \text{hash}(P)$.
**Compute First Text Window Hash**: Calculate $h_{T_0} = \text{hash}(T[0 \dots M-1])$.
**Iterate and Compare Hashes**:
- For each text window $T[i \dots i+M-1]$:
  - **If $h_P == h_{T_i}$**: A potential match. Perform a **character-by-character check** (called a "spurious hit check") to confirm. If it's a true match, record index $i$.
  - **If $h_P \neq h_{T_i}$**: A guaranteed mismatch. No need for character comparisons.
**Rolling Hash**: For the next window $T[i+1 \dots i+M]$, efficiently update $h_{T_{i+1}}$ from $h_{T_i}$ by "subtracting" the leading character $T[i]$ and "adding" the new trailing character $T[i+M]$.
$h_{new} = (d (h_{old} - T[old\_char] \cdot d^{M-1}) + T[new\_char]) \pmod q$
(Where $d^{M-1}$ is precomputed as $h_d = d^{M-1} \pmod q$).

Our demo will use a base $d=10$ (treating characters as digits, 'A'=1, 'B'=2 etc.) and a prime modulus $q=13$. This simplifies visualization, though real-world implementations use larger values for better collision avoidance.

🎬 Step-by-Step Demo

Demo initialized. Click 'Next Step' to begin the hash comparison process.

Visualization

Pattern Hash: ? Window Hash: ?

Output Log

Final Match Indices: None

📊 Algorithm Analysis

Best & Average Case Complexity

O(N + M)

When there are few or no hash collisions (spurious hits). Hash computation for $P$ is $O(M)$. Initial window hash is $O(M)$. Rolling hash for $N-M$ windows is $O(N-M)$. Confirmatory character checks are rare. Total: $O(M) + O(N-M) = O(N+M)$.

Worst Case Complexity

O(N * M)

Occurs when every window of text produces a hash collision with the pattern's hash (many spurious hits), forcing a full character-by-character comparison for almost every shift. E.g., $T=\text{AAAAAAAAAAAA}$, $P=\text{AAAA}$.

Efficiency & Hash Collisions

The efficiency of Rabin-Karp highly depends on the choice of the hash function (base $d$ and prime modulus $q$). A good hash function minimizes collisions, bringing the average case close to $O(N+M)$, which is much faster than Naive's worst-case $O(N \cdot M)$.

Rabin-Karp String Matching Algorithm